196 research outputs found

    On the Feasibility of Automated Detection of Allusive Text Reuse

    The detection of allusive text reuse is particularly challenging due to the sparse evidence on which allusive references rely, commonly consisting of few or no shared words. Arguably, lexical semantics can help here, since uncovering semantic relations between words has the potential to increase the support underlying the allusion and alleviate the lexical sparsity. A further obstacle is the lack of evaluation benchmark corpora, largely due to the highly interpretative character of the annotation process. In the present paper, we aim to elucidate the feasibility of automated allusion detection. We approach the matter from an Information Retrieval perspective in which referencing texts act as queries and referenced texts as relevant documents to be retrieved, and estimate the difficulty of benchmark corpus compilation by a novel inter-annotator agreement study on query segmentation. Furthermore, we investigate to what extent the integration of lexical semantic information derived from distributional models and ontologies can aid retrieving cases of allusive reuse. The results show that (i) despite low agreement scores, using manual queries considerably improves retrieval performance with respect to a windowing approach, and that (ii) retrieval performance can be moderately boosted with distributional semantics.

    Character-level Transformer-based Neural Machine Translation

    Neural machine translation (NMT) is nowadays commonly applied at the subword level, using byte-pair encoding. A promising alternative approach focuses on character-level translation, which simplifies processing pipelines in NMT considerably. This approach, however, must consider relatively longer sequences, rendering the training process prohibitively expensive. In this paper, we discuss a novel Transformer-based approach that we compare, in both speed and quality, to the Transformer at subword and character levels, as well as to previously developed character-level models. We evaluate our models on 4 language pairs from WMT'15: DE-EN, CS-EN, FI-EN and RU-EN. The proposed novel architecture can be trained on a single GPU and is 34% faster than the character-level Transformer; still, the obtained results are at least on par with it. In addition, our proposed model outperforms the subword-level model in FI-EN and shows close results in CS-EN. To stimulate further research in this area and close the gap with subword-level NMT, we make all our code and models publicly available.

    Advances in Distant Diplomatics: A Stylometric Approach to Medieval Charters

    The quantitative analysis of writing style (stylometry) is becoming an increasingly common research instrument in philology. When it comes to medieval texts, such a methodology might be able to help us disentangle the multiple authorial strata that can often be discerned in them (issuer, dictator, scribe, etc.). To deliver a proof of concept in 'distant diplomatics,' we have turned to a corpus of twelfth-century Latin charters from the Cambrai episcopal chancery. We subjected this collection to an (unsupervised) stylometric modelling procedure, based on lexical frequency extraction and dimension reduction. In the absence of a sizable 'ground truth' for this material, we zoomed in on a specific case study, namely the oeuvre of the previously identified dictator-scribe known as 'RogF/JeanE.' Our results offer additional support for the attribution of a diplomatic oeuvre to this individual and even allow us to enlarge it with additional documents. Our analysis moreover yielded the serendipitous discovery of another, previously unnoticed, oeuvre, which we tentatively attribute to a scribe-dictator 'JeanB.' We conclude that large-scale stylometric analysis is a promising methodology for digital diplomatics. More efforts, however, will have to be invested in establishing gold standards for this method to realize its full potential.

    From exemplar to copy: the scribal appropriation of a Hadewijch manuscript computationally explored

    This study is devoted to two of the oldest known manuscripts in which the oeuvre of the medieval mystical author Hadewijch has been preserved: Brussels, KBR, 2879-2880 (ms. A) and Brussels, KBR, 2877-2878 (ms. B). On the basis of codicological and contextual arguments, it is assumed that the scribe who produced B used A as an exemplar. While the similarities in both layout and content between the two manuscripts are striking, the present article seeks to identify the differences. After all, regardless of the intention to produce a copy that closely follows the exemplar, subtle linguistic variation is apparent. Divergences relate to spelling conventions, but also to the way in which words are abbreviated (and the extent to which abbreviations occur). The present study investigates the spelling profiles of the scribes who produced mss. A and B in a computational way. In the first part of this study, we will present both manuscripts in more detail, after which we will consider prior research carried out on scribal profiling. The current study both builds and expands on Kestemont (2015). Next, we outline the methodology used to analyse and measure the degree of scribal appropriation that took place when ms. B was copied from the exemplar ms. A. After this, we will discuss the results obtained, focusing on the scribal variation that can be found at the level of both individual words and n-grams. To this end, we use machine learning to identify the most distinctive features that separate manuscript A from B. Finally, we look at possible diachronic trends in the appropriation by B's scribe of his exemplar. We argue that scribal takeovers in the exemplar impact the practice of the copying scribe, while transitions to a different content matter cause little to no effect.

    DHBeNeLux : incubator for digital humanities in Belgium, the Netherlands and Luxembourg

    Digital Humanities BeNeLux is a grassroots initiative to foster knowledge networking and dissemination in digital humanities in Belgium, the Netherlands, and Luxembourg. This special issue highlights a selection of the work that was presented at the DHBenelux 2015 Conference, by way of an anthology of the digital humanities work currently being done in the Benelux area and beyond. The introduction describes why this grassroots initiative came about, how DHBenelux currently supports community building and knowledge exchange for digital humanities in the Benelux area, and how this integrates regional digital humanities into the larger international digital humanities environment.

    Collaborative authorship in the twelfth century: a stylometric study of Hildegard of Bingen and Guibert of Gembloux

    Hildegard of Bingen (1098–1179) is one of the most influential female authors of the Middle Ages. From the point of view of computational stylistics, the oeuvre attributed to Hildegard is fascinating. Hildegard dictated her texts to secretaries in Latin, a language whose grammatical subtleties she had not fully mastered. She therefore allowed her scribes to correct her spelling and grammar. Hildegard’s last collaborator in particular, Guibert of Gembloux, seems to have considerably reworked her works during his secretaryship. Whereas her other scribes were only allowed to make superficial linguistic changes, Hildegard would have permitted Guibert to render her language stylistically more elegant. In this article, we focus on two shorter texts: the Visio ad Guibertum missa and Visio de sancto Martino, both of which Hildegard allegedly authored during Guibert’s secretaryship. We analyse a corpus containing the letter collections of Hildegard, Guibert and Bernard of Clairvaux using a number of common stylometric techniques. We discuss our results in the light of the Synergy Hypothesis, suggesting that texts resulting from collaboration can display a style markedly different from that of the collaborating authors. Finally, we demonstrate that Guibert must have reworked the disputed visionary texts allegedly authored by Hildegard to such an extent that style-oriented computational procedures attribute the texts to Guibert.